Abstract



We give a new proof of the classical Central Limit Theorem, in the Mallows ($L^r$-Wasserstein) distance. Our proof is elementary in the sense that it does not require complex analysis, but rather makes use of a simple subadditive inequality related to this metric. The key is to analyse the case where equality holds. We provide some results concerning rates of convergence. We also consider convergence to stable distributions, and obtain a bound on the rate of such convergence.

1 Introduction and main results 

The spirit of the Central Limit Theorem, that normalised sums of independent random variables converge to a normal distribution, can be understood in different senses, according to the distance used. For example, in addition to the standard Central Limit Theorem in the sense of weak convergence, we mention the proofs in Prohorov (1952) of $L^1$ convergence of densities, in Gnedenko and Kolmogorov (1954) of $L^\infty$ convergence of densities, in Barron (1986) of convergence in relative entropy and in Shimizu (1975) and Johnson and Barron (2004) of convergence in Fisher information.

In this paper we consider the Central Limit Theorem with respect to the 
Mallows distance and prove convergence to stable laws in the infinite variance 
setting. We study the rates of convergence in both cases. 

Definition 1.1 For any $r > 0$, we define the Mallows $r$-distance between probability distribution functions $F_X$ and $F_Y$ as

$$d_r(F_X, F_Y) = \Bigl( \inf_{(X,Y)} \mathbb{E}|X - Y|^r \Bigr)^{1/r},$$

where the infimum is taken over pairs $(X, Y)$ whose marginal distribution functions are $F_X$ and $F_Y$ respectively, and may be infinite. Where it causes no confusion, we write $d_r(X, Y)$ for $d_r(F_X, F_Y)$.

Define $\mathcal{F}_r$ to be the set of distribution functions $F$ such that $\int |x|^r \, dF(x) < \infty$. Bickel and Freedman (1981) show that for $r \ge 1$, $d_r$ is a metric on $\mathcal{F}_r$. If $r < 1$, then $d_r^r$ is a metric on $\mathcal{F}_r$. In considering stable convergence, we shall also be concerned with the case where the absolute $r$th moments are not finite.

Throughout the paper, we write $Z_{\mu,\sigma^2}$ for a $N(\mu, \sigma^2)$ random variable, $Z_{\sigma^2}$ for a $N(0, \sigma^2)$ random variable, and $\Phi_{\mu,\sigma^2}$ and $\Phi_{\sigma^2}$ for their respective distribution functions. We establish the following main theorems:

Theorem 1.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero and finite variance $\sigma^2 > 0$, and let $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$. Then

$$\lim_{n \to \infty} d_2(S_n, Z_{\sigma^2}) = 0.$$

Moreover, Theorem 3.2 shows that for any $r \ge 2$, if $d_r(X_i, Z_{\sigma^2}) < \infty$, then $\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0$. Theorem 1.2 implies the standard Central Limit Theorem in the sense of weak convergence (Bickel and Freedman 1981, Lemma 8.3).

Theorem 1.3 Fix $\alpha \in (0, 2)$, and let $X_1, X_2, \ldots$ be independent random variables (where $\mathbb{E}X_i = 0$, if $\alpha > 1$), and $S_n = (X_1 + \ldots + X_n)/n^{1/\alpha}$. If there exists an $\alpha$-stable random variable $Y$ such that $\sup_i d_\beta(X_i, Y) < \infty$ for some $\beta \in (\alpha, 2]$, then $\lim_{n \to \infty} d_\beta(S_n, Y) = 0$. In fact

$$d_\beta(S_n, Y) \le \frac{2^{1/\beta}}{n^{1/\alpha}} \Bigl( \sum_{i=1}^n d_\beta^\beta(X_i, Y) \Bigr)^{1/\beta},$$

so in the identically distributed case the rate of convergence is $O(n^{1/\beta - 1/\alpha})$.

See also Rachev and Rüschendorf (1992, 1994), who obtain similar results using different techniques in the case of identically distributed $X_i$ and strictly symmetric $Y$. In Lemma 5.3 we exhibit a large class $\mathcal{C}_K$ of distribution functions $F_X$ for which $d_\beta(X, Y) \le K$, so the theorem can be applied.

Theorem 1.2 follows by understanding the subadditivity of $d_2^2(S_n, Z_{\sigma^2})$ (see Equation (4)). We consider the powers-of-two subsequence $T_k = S_{2^k}$, and use Rényi's method, introduced in Rényi (1961) to provide a proof of convergence to equilibrium of Markov chains; see also Kendall (1963). This technique was also used in Csiszár (1965) to show convergence to Haar measure for convolutions of measures on compact groups, and in Shimizu (1975) to show convergence of Fisher information in the Central Limit Theorem. The method has four stages:

1. Consider independent and identically distributed random variables $X_1$ and $X_2$ with mean $\mu$ and variance $\sigma^2 > 0$, and write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$. In Proposition 2.4 we observe that

$$D\Bigl( \frac{X_1 + X_2}{\sqrt{2}} \Bigr) \le D(X_1), \tag{1}$$

with equality if and only if $X_1, X_2 \sim Z_{\mu,\sigma^2}$. Hence $D(T_k)$ is decreasing and bounded below, so converges to some $D$.

2. In Proposition 2.5 we use a compactness argument to show that there exists a strictly increasing sequence $k_r$ and a random variable $T$ such that

$$\lim_{r \to \infty} D(T_{k_r}) = D(T).$$

Further,

$$\lim_{r \to \infty} D(T_{k_r + 1}) = \lim_{r \to \infty} D\Bigl( \frac{T_{k_r} + T'_{k_r}}{\sqrt{2}} \Bigr) = D\Bigl( \frac{T + T'}{\sqrt{2}} \Bigr),$$

where the $T'_{k_r}$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively.

3. We combine these two results: since $D(T_{k_r})$ and $D(T_{k_r + 1})$ are both subsequences of the convergent sequence $D(T_k)$, they must have a common limit. That is,

$$D = D(T) = D\Bigl( \frac{T + T'}{\sqrt{2}} \Bigr),$$

so by the condition for equality in Proposition 2.4 we deduce that $T \sim N(0, \sigma^2)$ and $D = 0$.

4. Proposition 2.4 implies the standard subadditive relation

$$(m + n) D(S_{m+n}) \le m D(S_m) + n D(S_n).$$

Now Theorem 6.6.1 of Hille (1948) implies that $D(S_n)$ converges to $\inf_n D(S_n) = 0$.
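The monotonicity used in stage 1 can be checked exactly in one simple case. The sketch below is an illustration and not part of the original argument: it assumes $X_i = \pm 1$ with probability 1/2 each (so $\sigma^2 = 1$), couples $S_n$ with a standard normal through their quantile functions, and uses $D(S_n) = 2 - 2\,\mathbb{E}(S_n^* Z^*)$ together with the identity $\int_a^b \Phi^{-1}(p)\,dp = \phi(\Phi^{-1}(a)) - \phi(\Phi^{-1}(b))$. The function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def phi_at_quantile(p):
    """Return phi(Phi^{-1}(p)), using the limiting value 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))


def mallows_D_rademacher(n):
    """Exact D(S_n) = d_2^2(S_n, N(0,1)) for S_n = (X_1+...+X_n)/sqrt(n), X_i = +-1."""
    total = 2 ** n
    k = 0.0   # k = E[S_n^* Z^*] under the quantile coupling
    cum = 0   # running count of sign patterns, so cum / total is the CDF of S_n
    for j in range(n + 1):
        atom = (2 * j - n) / sqrt(n)   # value of S_n when j of the X_i equal +1
        lower = cum / total
        cum += comb(n, j)
        upper = cum / total
        # The integral of Phi^{-1}(p) over (lower, upper) is phi(z_lower) - phi(z_upper).
        k += atom * (phi_at_quantile(lower) - phi_at_quantile(upper))
    return 2.0 - 2.0 * k


# D(T_k) along the powers-of-two subsequence T_k = S_{2^k}, for k = 0, ..., 5
values = [mallows_D_rademacher(2 ** k) for k in range(6)]
```

Under these assumptions the list `values` should be strictly decreasing, in line with Equation (1); the normal law is the only case of equality, and a lattice variable is never normal.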

The proof of Theorem 1.3 is given in Section 5.

2 Subadditivity of Mallows distance 

The Mallows distance and related metrics originated with a transportation problem posed by Monge in 1781 (Rachev 1984, Dudley 1989, pp. 329-330). Kantorovich generalised this problem, and considered the distance obtained by minimising $\mathbb{E}c(X, Y)$, for a general metric $c$ (known as the cost function), over all joint distributions of pairs $(X, Y)$ with fixed marginals. This distance is also known as the Wasserstein metric. Rachev (1984) reviews applications to differential geometry, infinite-dimensional linear programming and information theory, among many others. Mallows (1972) focused on the metric which we have called $d_2$, while $d_1$ is sometimes called the Gini index.

In Lemma 2.3 below, we review the existence and uniqueness of the construction which attains the infimum in Definition 1.1, using the concept of a quasi-monotone function.

Definition 2.1 A function $k : \mathbb{R}^2 \to \mathbb{R}$ induces a signed measure $\mu_k$ on $\mathbb{R}^2$ given by

$$\mu_k\{(x, x'] \times (y, y']\} = k(x, y) + k(x', y') - k(x, y') - k(x', y).$$

We say that $k$ is quasi-monotone if $\mu_k$ is a non-negative measure.

The function $k(x, y) = -|x - y|^r$ is quasi-monotone for $r \ge 1$, and if $r > 1$ then the measure $\mu_k$ is absolutely continuous, with a density which is positive Lebesgue almost everywhere. Tchen (1980, Corollary 2.1) gives the following result, a two-dimensional version of integration by parts.

Lemma 2.2 Let $k(x, y)$ be a quasi-monotone function and let $H_1(x, y)$ and $H_2(x, y)$ be distribution functions with the same marginals, where $H_1(x, y) \le H_2(x, y)$ for all $x, y$. Suppose there exists an $H_1$- and $H_2$-integrable function $g(x, y)$, bounded on compact sets, such that $|k(x^B, y^B)| \le g(x, y)$, where $x^B = (-B) \vee x \wedge B$. Then

$$\int k(x, y) \, dH_2(x, y) - \int k(x, y) \, dH_1(x, y) = \int \{ H_2^-(x, y) - H_1^-(x, y) \} \, d\mu_k(x, y).$$

Here $H_i^-(x, y) = \mathbb{P}(X < x, Y < y)$, where $(X, Y)$ have joint distribution function $H_i$.

Lemma 2.3 For $r \ge 1$, consider the joint distribution of pairs $(X, Y)$ where $X$ and $Y$ have fixed marginals $F_X$ and $F_Y$, both in $\mathcal{F}_r$. Then

$$\mathbb{E}|X - Y|^r \ge \mathbb{E}|X^* - Y^*|^r, \tag{2}$$

where $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$ and $U \sim U(0, 1)$. For $r > 1$, equality is attained only if $(X, Y) \sim (X^*, Y^*)$.

Proof Observe, as in Fréchet (1951), that if the random variables $X, Y$ have fixed marginals $F_X$ and $F_Y$, then

$$\mathbb{P}(X \le x, Y \le y) \le H^+(x, y), \tag{3}$$

where $H^+(x, y) = \min(F_X(x), F_Y(y))$. This bound is achieved by taking $U \sim U(0, 1)$ and setting $X^* = F_X^{-1}(U)$, $Y^* = F_Y^{-1}(U)$.

Thus, by Lemma 2.2, with $k(x, y) = -|x - y|^r$, for $r \ge 1$, and taking $H_1(x, y) = \mathbb{P}(X \le x, Y \le y)$ and $H_2 = H^+$, we deduce that

$$\mathbb{E}|X - Y|^r - \mathbb{E}|X^* - Y^*|^r = \int \{ H_2^-(x, y) - H_1^-(x, y) \} \, d\mu_k(x, y) \ge 0,$$

so $(X^*, Y^*)$ achieves the infimum in the definition of the Wasserstein distance.

Finally, since taking $r > 1$ implies that the measure $\mu_k$ has a strictly positive density with respect to Lebesgue measure, we can only have equality in (2) if $\mathbb{P}(X < x, Y < y) = \min\{F_X(x), F_Y(y)\}$ Lebesgue almost everywhere. But the joint distribution function is right-continuous, so this condition determines the value of $\mathbb{P}(X \le x, Y \le y)$ everywhere. □
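For two samples of equal size, the quantile coupling of Lemma 2.3 amounts to pairing the order statistics. The following sketch is an illustration rather than anything from the source: it assumes $r \ge 1$ and equal sample sizes, and the function names are hypothetical. The brute-force version minimises over every pairing and is only feasible for very small samples; it is included to confirm that sorting attains the minimum.

```python
from itertools import permutations


def empirical_mallows(xs, ys, r):
    """d_r between the empirical distributions of two equal-sized samples (r >= 1).

    Sorting both samples is the empirical version of the coupling
    X* = F_X^{-1}(U), Y* = F_Y^{-1}(U).
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    cost = sum(abs(x - y) ** r for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / r)


def brute_force_mallows(xs, ys, r):
    """The same distance, found by minimising the cost over every pairing."""
    best = min(
        sum(abs(x - y) ** r for x, y in zip(xs, perm))
        for perm in permutations(ys)
    )
    return (best / len(xs)) ** (1.0 / r)
```

For instance, `empirical_mallows([3, 1, 2], [2, 4, 3], 2)` pairs 1 with 2, 2 with 3 and 3 with 4, so every matched pair differs by exactly 1.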

Using the construction in Lemma 2.3, Bickel and Freedman (1981) establish that if $X_1$ and $X_2$ are independent and $Y_1$ and $Y_2$ are independent, then

$$d_2^2(X_1 + X_2, Y_1 + Y_2) \le d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2). \tag{4}$$

Similar subadditive expressions arise in the proof of convergence of Fisher information in Johnson and Barron (2004). By focusing on the case $r = 2$ in Definition 1.1, and by using the theory of $L^2$ spaces and projections, we establish parallels with the Fisher information argument.

We prove Equation (4) below, and further consider the case of equality in this relation. Major (1978, p. 504) gives an equivalent construction to that given in Lemma 2.3. If $F_Y$ is a continuous distribution function, then $F_Y(Y) \sim U(0, 1)$, so we generate $Y^* \sim F_Y$ and take $X^* = F_X^{-1} \circ F_Y(Y^*)$. Recall that if $\mathbb{E}X = \mu$ and $\mathrm{Var}\,X = \sigma^2$, we write $D(X)$ for $d_2^2(X, Z_{\mu,\sigma^2})$.

Proposition 2.4 If $X_1, X_2$ are independent, with finite variances $\sigma_1^2, \sigma_2^2 > 0$, then for any $t \in (0, 1)$,

$$D\bigl( \sqrt{t}\, X_1 + \sqrt{1 - t}\, X_2 \bigr) \le t D(X_1) + (1 - t) D(X_2),$$

with equality if and only if $X_1$ and $X_2$ are normal.

Proof We consider bounding $D(X_1 + X_2)$ for independent $X_1$ and $X_2$ with mean zero, since the general result follows on translation and rescaling.

We generate independent $Y_i^* \sim N(0, \sigma_i^2)$, and take $X_i^* = F_{X_i}^{-1} \circ \Phi_{\sigma_i^2}(Y_i^*) = h_i(Y_i^*)$, say, for $i = 1, 2$. Further, writing $\sigma^2 = \sigma_1^2 + \sigma_2^2$, we define $Y^* = Y_1^* + Y_2^*$ and set $X^* = F_{X_1 + X_2}^{-1} \circ \Phi_{\sigma^2}(Y_1^* + Y_2^*) = h(Y_1^* + Y_2^*)$, say. Then

$$\begin{aligned}
d_2^2(X_1 + X_2, Y_1 + Y_2) &= \mathbb{E}(X^* - Y^*)^2 \\
&\le \mathbb{E}(X_1^* + X_2^* - Y_1^* - Y_2^*)^2 \\
&= \mathbb{E}(X_1^* - Y_1^*)^2 + \mathbb{E}(X_2^* - Y_2^*)^2 \\
&= d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2).
\end{aligned}$$

Equality holds if and only if $(X_1^* + X_2^*, Y_1^* + Y_2^*)$ has the same distribution as $(X^*, Y^*)$. By our construction of $Y^* = Y_1^* + Y_2^*$, this means that $(X_1^* + X_2^*, Y_1^* + Y_2^*)$ has the same distribution as $(X^*, Y_1^* + Y_2^*)$, so $\mathbb{P}\{X_1^* + X_2^* = h(Y_1^* + Y_2^*)\} = \mathbb{P}\{X^* = h(Y_1^* + Y_2^*)\} = 1$. Thus, if equality holds, then

$$h_1(Y_1^*) + h_2(Y_2^*) = h(Y_1^* + Y_2^*) \quad \text{almost surely.} \tag{5}$$

Brown (1982) and Johnson and Barron (2004) showed that equality holds in Equation (5) if and only if $h, h_1, h_2$ are linear. In particular, Proposition 2.1 of Johnson and Barron (2004) implies that there exist constants $a_i$ and $b_i$ such that

$$\mathbb{E}\{h(Y_1^* + Y_2^*) - h_1(Y_1^*) - h_2(Y_2^*)\}^2 \ge \frac{2\sigma_1^2 \sigma_2^2}{(\sigma_1^2 + \sigma_2^2)^2} \Bigl[ \mathbb{E}\{h_1(Y_1^*) - a_1 Y_1^* - b_1\}^2 + \mathbb{E}\{h_2(Y_2^*) - a_2 Y_2^* - b_2\}^2 \Bigr]. \tag{6}$$

Hence, if Equation (5) holds, then $h_i(u) = a_i u + b_i$ almost everywhere. Since $Y_i^*$ and $X_i^*$ have the same mean and variance, it follows that $a_i = 1$, $b_i = 0$. Hence $h_1(u) = h_2(u) = u$ and $X_i^* = Y_i^*$. □

Recall that $T_k = S_{2^k}$, where $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$ is a normalised sum of independent and identically distributed random variables of mean zero and finite variance $\sigma^2$.

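Proposition 2.5 There exists a strictly increasing sequence $(k_r) \in \mathbb{N}$ and a random variable $T$ such that

$$\lim_{r \to \infty} D(T_{k_r}) = D(T).$$

If $T'_{k_r}$ and $T'$ are independent copies of $T_{k_r}$ and $T$ respectively, then

$$\lim_{r \to \infty} D(T_{k_r + 1}) = \lim_{r \to \infty} D\Bigl( \frac{T_{k_r} + T'_{k_r}}{\sqrt{2}} \Bigr) = D\Bigl( \frac{T + T'}{\sqrt{2}} \Bigr).$$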

Proof Since $\mathrm{Var}(T_k) = \sigma^2$ for all $k$, the sequence $(T_k)$ is tight. Therefore, by Prohorov's theorem, there exists a strictly increasing sequence $(k_r)$ and a random variable $T$ such that

$$T_{k_r} \xrightarrow{d} T \tag{7}$$

as $r \to \infty$. Moreover, the proof of Lemma 5.2 of Brown (1982) shows that the sequence $(T_{k_r}^2)$ is uniformly integrable. But this, combined with Equation (7), implies that $\lim_{r \to \infty} d_2(T_{k_r}, T) = 0$ (Bickel and Freedman 1981, Lemma 8.3(b)). Hence

$$D(T_{k_r}) = d_2^2(T_{k_r}, Z_{\sigma^2}) \le \{ d_2(T_{k_r}, T) + d_2(T, Z_{\sigma^2}) \}^2 \to d_2^2(T, Z_{\sigma^2}) = D(T)$$

as $r \to \infty$. Similarly, $d_2^2(T, Z_{\sigma^2}) \le \{ d_2(T, T_{k_r}) + d_2(T_{k_r}, Z_{\sigma^2}) \}^2$, yielding the opposite inequality. This proves the first part of the proposition.

For the second part, it suffices to observe that $T_{k_r} + T'_{k_r} \xrightarrow{d} T + T'$ as $r \to \infty$, and $\mathbb{E}(T_{k_r} + T'_{k_r})^2 \to \mathbb{E}(T + T')^2$, and then use the same argument as in the first part of the proposition. □

Combining Propositions 2.4 and 2.5, as described in Section 1, the proof of Theorem 1.2 is now complete.



3 Convergence of dr for general r 

The subadditive inequality (4) arises in part from a moment inequality; that is, if $X_1$ and $X_2$ are independent with mean zero, then $\mathbb{E}|X_1 + X_2|^r \le \mathbb{E}|X_1|^r + \mathbb{E}|X_2|^r$, for $r = 2$. Similar results imply that for $r \ge 2$, we have $\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0$. First, we prove the following lemma:

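Lemma 3.1 Consider independent random variables $V_1, V_2, \ldots$ and $W_1, W_2, \ldots$, where for some $r \ge 2$ and for all $i$, $\mathbb{E}|V_i|^r < \infty$ and $\mathbb{E}|W_i|^r < \infty$. Then for any $m$, there exists a constant $c(r)$ such that

$$d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m) \le c(r) \Bigl\{ \sum_{i=1}^m d_r^r(V_i, W_i) + \Bigl( \sum_{i=1}^m d_2^2(V_i, W_i) \Bigr)^{r/2} \Bigr\}.$$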



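Proof We consider independent $U_i \sim U(0, 1)$, and set $V_i^* = F_{V_i}^{-1}(U_i)$ and $W_i^* = F_{W_i}^{-1}(U_i)$. Then

$$\begin{aligned}
d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m) &\le \mathbb{E} \Bigl| \sum_{i=1}^m (V_i^* - W_i^*) \Bigr|^r \\
&\le c(r) \Bigl\{ \sum_{i=1}^m \mathbb{E}|V_i^* - W_i^*|^r + \Bigl( \sum_{i=1}^m \mathbb{E}|V_i^* - W_i^*|^2 \Bigr)^{r/2} \Bigr\},
\end{aligned}$$

as required. This final line is an application of Rosenthal's inequality (Petrov 1995, Theorem 2.9) to the sequence $(V_i^* - W_i^*)$. □

Using Lemma 3.1, we establish the following theorem.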

Theorem 3.2 Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean zero, variance $\sigma^2 > 0$ and $\mathbb{E}|X_1|^r < \infty$ for some $r \ge 2$. If $S_n = (X_1 + \ldots + X_n)/\sqrt{n}$, then

$$\lim_{n \to \infty} d_r(S_n, Z_{\sigma^2}) = 0.$$

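Proof Theorem 1.2 covers the case of $r = 2$, so we need only consider $r > 2$. We use a scaled version of Lemma 3.1 twice. First, we use $V_i = X_i$, $W_i \sim N(0, \sigma^2)$ and $m = n$, in order to deduce that, by monotonicity of the $r$-norms:

$$\begin{aligned}
d_r^r(S_n, Z_{\sigma^2}) &\le c(r) \bigl\{ n^{1 - r/2} d_r^r(X_1, Z_{\sigma^2}) + d_2^r(X_1, Z_{\sigma^2}) \bigr\} \\
&\le c(r) (n^{1 - r/2} + 1) \, d_r^r(X_1, Z_{\sigma^2}),
\end{aligned}$$

so that $d_r^r(S_n, Z_{\sigma^2})$ is uniformly bounded in $n$, by $K$, say. Then, for general $n$, define $N = \lceil \sqrt{n} \rceil$, take $m = \lceil n/N \rceil$, and $u = n - (m - 1)N \le N$. In Lemma 3.1, take

$$V_i = X_{(i-1)N + 1} + \ldots + X_{iN} \quad \text{for } i = 1, \ldots, m - 1, \qquad V_m = X_{(m-1)N + 1} + \ldots + X_n,$$

and $W_i \sim N(0, N\sigma^2)$ for $i = 1, \ldots, m - 1$, $W_m \sim N(0, u\sigma^2)$ independently. Now the uniform bound above gives, on rescaling,

$$d_r^r(V_i, W_i) = N^{r/2} d_r^r(S_N, Z_{\sigma^2}) \le N^{r/2} K \quad \text{for } i = 1, \ldots, m - 1,$$

and $d_r^r(V_m, W_m) = u^{r/2} d_r^r(S_u, Z_{\sigma^2}) \le N^{r/2} K$. Further, $d_2^2(V_i, W_i) = N d_2^2(S_N, Z_{\sigma^2})$ for $i = 1, \ldots, m - 1$ and $d_2^2(V_m, W_m) = u \, d_2^2(S_u, Z_{\sigma^2}) \le N d_2^2(S_1, Z_{\sigma^2})$. Hence, using Lemma 3.1 again, we obtain

$$\begin{aligned}
d_r^r(S_n, Z_{\sigma^2}) &= \frac{1}{n^{r/2}} \, d_r^r(V_1 + \ldots + V_m, W_1 + \ldots + W_m) \\
&\le \frac{c(r)}{n^{r/2}} \Bigl\{ \sum_{i=1}^m d_r^r(V_i, W_i) + \Bigl( \sum_{i=1}^m d_2^2(V_i, W_i) \Bigr)^{r/2} \Bigr\} \\
&\le c(r) \Bigl\{ \frac{m K N^{r/2}}{n^{r/2}} + \Bigl( \frac{N(m - 1)}{n} d_2^2(S_N, Z_{\sigma^2}) + \frac{N}{n} d_2^2(S_1, Z_{\sigma^2}) \Bigr)^{r/2} \Bigr\} \\
&\le c(r) \Bigl\{ \frac{K m}{(m - 1)^{r/2}} + \Bigl( d_2^2(S_N, Z_{\sigma^2}) + \frac{1}{m - 1} d_2^2(S_1, Z_{\sigma^2}) \Bigr)^{r/2} \Bigr\}.
\end{aligned}$$

This converges to zero since $\lim_{n \to \infty} d_2^2(S_N, Z_{\sigma^2}) = 0$. □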



4 Strengthening subadditivity 

Under certain conditions, we obtain a rate for the convergence in Theorem 1.2. Equation (1) shows that $D(T_k)$ is decreasing. Since $D(T_k)$ is bounded below, the difference sequence $D(T_k) - D(T_{k+1})$ converges to zero. As in Johnson and Barron (2004) we examine this difference sequence, to show that its convergence implies convergence of $D(T_k)$ to zero.

Further, in the spirit of Johnson and Barron (2004), we hope that if the difference sequence is small, then equality 'nearly' holds in Equation (5), and so the functions $h, h_1, h_2$ are 'nearly' linear. This implies that if $\mathrm{Cov}(X, Y)$ is close to its maximum, then $X$ is close to $h(Y)$ in the $L^2$ sense.

Following del Barrio et al. (1999), we define a new distance quantity $D^*(X) = \inf_{m, s^2} d_2^2(X, Z_{m, s^2})$. Notice that $D(X) = 2\sigma^2 - 2\sigma k \le 2\sigma^2$, where $k = \int_0^1 F_X^{-1}(x) \Phi^{-1}(x) \, dx$. This follows since $F_X^{-1}$ and $\Phi^{-1}$ are increasing functions, so $k \ge 0$ by Chebyshev's rearrangement lemma. Using results of del Barrio et al. (1999), it follows that

$$D^*(X) = \sigma^2 - k^2 = D(X) - \frac{D(X)^2}{4\sigma^2},$$

and convergence of $D(S_n)$ to zero is equivalent to convergence of $D^*(S_n)$ to zero.
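As a numerical illustration of these formulas (not taken from the source), the sketch below evaluates $k$, $D$ and $D^*$ for a uniform law on $(-1/2, 1/2)$, for which $F_X^{-1}(p) = p - 1/2$ and $\sigma^2 = 1/12$. The midpoint rule, the grid size and the function name are assumptions made for the example. In this case $k = \mathbb{E}\{Z\Phi(Z)\} = 1/(2\sqrt{\pi})$ for a standard normal $Z$, which gives an independent check on the quadrature.

```python
from math import pi, sqrt
from statistics import NormalDist


def mallows_to_normal(quantile, variance, grid=20000):
    """Return (D, D_star, k) for a mean-zero law with the given quantile function.

    k = int_0^1 F^{-1}(p) Phi^{-1}(p) dp is approximated by the midpoint rule.
    """
    std_normal = NormalDist()
    h = 1.0 / grid
    k = h * sum(
        quantile((i + 0.5) * h) * std_normal.inv_cdf((i + 0.5) * h)
        for i in range(grid)
    )
    sigma = sqrt(variance)
    D = 2.0 * variance - 2.0 * sigma * k   # distance to the normal with matched moments
    D_star = variance - k * k              # distance to the closest normal law
    return D, D_star, k


# Uniform law on (-1/2, 1/2): F^{-1}(p) = p - 1/2 and variance 1/12
D, D_star, k = mallows_to_normal(lambda p: p - 0.5, 1.0 / 12.0)
```

The two expressions for $D^*(X)$ agree algebraically, so the computed `D_star` should match `D - D**2 / (4 * variance)` up to rounding.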

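Proposition 4.1 Let $X_1$ and $X_2$ be independent and identically distributed random variables with mean $\mu$, variance $\sigma^2 > 0$ and densities (with respect to Lebesgue measure). Defining $g(u) = \Phi_{\mu,\sigma^2}^{-1} \circ F_{(X_1 + X_2)/\sqrt{2}}(u)$, if the derivative $g'(u) \ge c$ for all $u$ then

$$D\Bigl( \frac{X_1 + X_2}{\sqrt{2}} \Bigr) \le \Bigl( 1 - \frac{c}{2} \Bigr) D(X_1) + \frac{c \, D(X_1)^2}{8\sigma^2} \le \Bigl( 1 - \frac{c}{4} \Bigr) D(X_1).$$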

Proof As before, translation invariance allows us to take $\mathbb{E}X_i = 0$. For random variables $X, Y$, we consider the difference term in Equation (3) and write $g(u) = F_Y^{-1} \circ F_X(u)$, and $h(u) = g^{-1}(u)$. The function $k(x, y) = -\{x - h(y)\}^2$ is quasi-monotone and induces the measure $d\mu_k(x, y) = 2h'(y) \, dx \, dy$. Taking $H_1(x, y) = \mathbb{P}(X \le x, Y \le y)$ and $H_2(x, y) = \min\{F_X(x), F_Y(y)\}$ in Lemma 2.2 implies that

$$\mathbb{E}\{X - h(Y)\}^2 = 2 \int h'(y) \{ H_2(x, y) - H_1(x, y) \} \, dx \, dy,$$

since $\mathbb{E}\{X^* - h(Y^*)\}^2 = 0$. By assumption $h'(y) \le 1/c$, so

$$\mathbb{E}\{X - h(Y)\}^2 \le \frac{2}{c} \{ \mathrm{Cov}(X^*, Y^*) - \mathrm{Cov}(X, Y) \}.$$

Again take $Y_1^*, Y_2^*$ independent $N(0, \sigma^2)$ and set $X_i^* = F_{X_i}^{-1} \circ F_{Y_i}(Y_i^*) = h_i(Y_i^*)$. Then define $Y^* = Y_1^* + Y_2^*$ and take $X^* = F_{X_1 + X_2}^{-1} \circ F_{Y_1 + Y_2}(Y^*)$. Then there exist $a$ and $b$ such that

$$\begin{aligned}
d_2^2(X_1, Y_1) + d_2^2(X_2, Y_2) &- d_2^2(X_1 + X_2, Y_1 + Y_2) \\
&= \mathbb{E}(X_1^* + X_2^* - Y_1^* - Y_2^*)^2 - \mathbb{E}(X^* - Y^*)^2 \\
&= 2\,\mathrm{Cov}(X^*, Y^*) - 2\,\mathrm{Cov}(X_1^* + X_2^*, Y_1^* + Y_2^*) \\
&\ge c \, \mathbb{E}\{X_1^* + X_2^* - h(Y_1^* + Y_2^*)\}^2 \\
&= c \, \mathbb{E}\{h_1(Y_1^*) + h_2(Y_2^*) - h(Y_1^* + Y_2^*)\}^2 \\
&\ge c \, \mathbb{E}\{h_1(Y_1^*) - a Y_1^* - b\}^2 \ge c \, D^*(X_1),
\end{aligned}$$

where the penultimate inequality follows by Equation (6). Recall that $D(X) \le 2\sigma^2$, so that $D^*(X) = D(X) - D(X)^2/(4\sigma^2) \ge D(X)/2$. The result follows on rescaling. □



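We briefly discuss the strength of the condition imposed. If $X$ has mean zero, distribution function $F_X$ and continuous density $f_X$, define the scale invariant quantity

$$C(X) = \inf_u \, (\Phi_{\sigma^2}^{-1} \circ F_X)'(u) = \inf_{p \in (0,1)} \frac{f_X(F_X^{-1}(p))}{\phi_{\sigma^2}(\Phi_{\sigma^2}^{-1}(p))} = \inf_{p \in (0,1)} \frac{\sigma f_X(F_X^{-1}(p))}{\phi(\Phi^{-1}(p))}.$$

We want to understand when $C(X) > 0$.

Example 4.2 If $X \sim U(0, 1)$, then $C(X) = 1/\{\sqrt{12} \, \sup_x \phi(x)\} = \sqrt{\pi/6}$.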

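Lemma 4.3 If $X$ has mean zero and variance $\sigma^2$ then $C(X)^2 \le \sigma^2 / (\sigma^2 + \mathrm{median}(X)^2)$.

Proof By the Mean Value Inequality, for all $p$,

$$|\Phi_{\sigma^2}^{-1}(p)| = |\Phi_{\sigma^2}^{-1}(p) - \Phi_{\sigma^2}^{-1}(1/2)| \ge C(X) \, |F_X^{-1}(p) - F_X^{-1}(1/2)|,$$

so that

$$\sigma^2 + F_X^{-1}(1/2)^2 = \int_0^1 F_X^{-1}(p)^2 \, dp + F_X^{-1}(1/2)^2 = \int_0^1 \{F_X^{-1}(p) - F_X^{-1}(1/2)\}^2 \, dp \le \frac{1}{C(X)^2} \int_0^1 \Phi_{\sigma^2}^{-1}(p)^2 \, dp = \frac{\sigma^2}{C(X)^2}.$$

□

In general we are concerned with the rate at which $f_X(x) \to 0$ at the edges of the support.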

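Lemma 4.4 If for some $\epsilon > 0$,

$$f_X(F_X^{-1}(p)) \ge c (1 - p)^{1 - \epsilon} \quad \text{as } p \to 1 \tag{8}$$

then $\lim_{p \to 1} f_X(F_X^{-1}(p)) / \phi(\Phi^{-1}(p)) = \infty$. Correspondingly, if

$$f_X(F_X^{-1}(p)) \ge c \, p^{1 - \epsilon} \quad \text{as } p \to 0 \tag{9}$$

then $\lim_{p \to 0} f_X(F_X^{-1}(p)) / \phi(\Phi^{-1}(p)) = \infty$.

Proof Simply note that by the Mills ratio (Shorack and Wellner 1986, p. 850), as $x \to \infty$, $1 - \Phi(x) \sim \phi(x)/x$, so that as $p \to 1$, $\phi(\Phi^{-1}(p)) \sim (1 - p) \Phi^{-1}(p) \sim (1 - p) \sqrt{-2 \log(1 - p)}$. □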

Example 4.5

1. The density of the $n$-fold convolution of $U(0, 1)$ random variables is given by $f_X(x) = x^{n-1}/(n - 1)!$ for $0 < x < 1$, hence $F_X^{-1}(p) = (n! \, p)^{1/n}$, and $f_X(F_X^{-1}(p)) = n \, p^{(n-1)/n} / (n!)^{1/n}$, so that Equation (9) holds.

2. For an Exp(1) random variable, $f_X(F_X^{-1}(p)) = 1 - p$, so that Equation (8) fails and $C(X) = 0$.
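The two contrasting cases, a uniform law with $C(X) > 0$ and an exponential law with $C(X) = 0$, can be explored numerically. The sketch below is illustrative only: the grid, the chosen tail probabilities and the helper name are assumptions, and the infimum for the uniform law is approximated by a minimum over a finite grid that contains $p = 1/2$.

```python
from math import pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def density_ratio(density_at_quantile, sigma, p):
    """sigma * f_X(F_X^{-1}(p)) / phi(Phi^{-1}(p)), the quantity minimised in C(X)."""
    return sigma * density_at_quantile(p) / STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p))


# U(0,1): f_X(F_X^{-1}(p)) = 1 and sigma = 1/sqrt(12); minimise over a grid in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
c_uniform = min(density_ratio(lambda p: 1.0, 1 / sqrt(12), p) for p in grid)

# Exp(1): f_X(F_X^{-1}(p)) = 1 - p and sigma = 1. By symmetry of the normal law, the
# ratio at p = 1 - q equals q / phi(Phi^{-1}(q)), which avoids forming 1 - q in floats.
exp_ratios = [
    q / STD_NORMAL.pdf(STD_NORMAL.inv_cdf(q)) for q in (1e-2, 1e-4, 1e-8, 1e-12)
]
```

Here `c_uniform` should agree with $\sqrt{\pi/6}$, while `exp_ratios` decreases as $q \to 0$. The decay is only of order $1/\sqrt{-2\log q}$, so the ratio approaches zero slowly.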

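To obtain bounds on $D(S_n)$ as $n \to \infty$, we need to control the sequence $C(S_n)$. Motivated by properties of the (seemingly related) Poincaré constant, we conjecture that $C((X_1 + X_2)/\sqrt{2}) \ge C(X_1)$ for independent and identically distributed $X_i$. If this is true and $C(X) = c$ then $C(S_n) \ge c$ for all $n$.

Assuming that $C(S_n) \ge c$ for all $n$, note that $D(T_k) \le (1 - c/4)^k D(X_1) \le (1 - c/4)^k (2\sigma^2)$. Now

$$D(T_{k+1}) \le D(T_k) (1 - c/2) \Bigl( 1 + \frac{c \, D(T_k)}{8\sigma^2 (1 - c/2)} \Bigr),$$

so

$$\prod_{k=0}^{\infty} \Bigl( 1 + \frac{c \, D(T_k)}{8\sigma^2 (1 - c/2)} \Bigr) \le \exp\Bigl( \sum_{k=0}^{\infty} \frac{c \, D(T_k)}{8\sigma^2 (1 - c/2)} \Bigr) \le \exp\Bigl( \frac{1}{1 - c/2} \Bigr).$$

We deduce that

$$D(T_k) \le D(X_1) \exp\Bigl( \frac{1}{1 - c/2} \Bigr) (1 - c/2)^k,$$

or $D(S_n) = O(n^t)$, where $t = \log_2(1 - c/2)$.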

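Remark 4.6 In general, convergence of $d_4(S_n, Z_{\sigma^2})$ cannot occur at a rate faster than $O(1/n)$. This follows because $\mathbb{E}S_n^4 = 3\sigma^4 + \gamma(X_1)/n$, where $\gamma(X)$, the excess kurtosis, is defined by $\gamma(X) = \mathbb{E}X^4 - 3(\mathbb{E}X^2)^2$ (when $\mathbb{E}X = 0$). Thus by Minkowski's inequality,

$$d_4(S_n, Z_{\sigma^2}) \ge \bigl| (\mathbb{E}S_n^4)^{1/4} - (\mathbb{E}Z_{\sigma^2}^4)^{1/4} \bigr| = 3^{1/4} \sigma \, \Bigl| \Bigl( 1 + \frac{\gamma(X_1)}{3\sigma^4 n} \Bigr)^{1/4} - 1 \Bigr| = \frac{3^{1/4} |\gamma(X_1)|}{12 \sigma^3 n} + O\Bigl( \frac{1}{n^2} \Bigr).$$

Motivated by this remark, and by analogy with the rates discovered in Johnson and Barron (2004), we conjecture that the true rate of convergence is $D(S_n) = O(1/n)$. To obtain this, we would need to control $1 - C(S_n)$.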
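The fourth-moment identity in Remark 4.6 can be verified exactly in a simple case. The sketch below is an illustration, not part of the source: it assumes $X_i = \pm 1$ with equal probability, so that $\sigma^2 = 1$ and $\gamma(X) = 1 - 3 = -2$, and uses exact rational arithmetic. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment_rademacher_sum(n):
    """Exact E S_n^4 for S_n = (X_1 + ... + X_n)/sqrt(n) with X_i = +-1 equally likely."""
    # With j of the X_i equal to +1, the unnormalised sum is 2j - n.
    total = sum(comb(n, j) * (2 * j - n) ** 4 for j in range(n + 1))
    return Fraction(total, 2 ** n * n ** 2)


# Here sigma^2 = 1 and gamma(X) = E X^4 - 3 (E X^2)^2 = 1 - 3 = -2.
moments = {n: fourth_moment_rademacher_sum(n) for n in (1, 2, 5, 10, 50)}
```

Under these assumptions each entry should equal $3 - 2/n$ exactly, matching $\mathbb{E}S_n^4 = 3\sigma^4 + \gamma(X_1)/n$.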



5 Convergence to stable distributions 

We now consider convergence to other stable distributions. Gnedenko and Kolmogorov (1954) review classical results of this kind. We say that $Y$ is $\alpha$-stable if, when $Y_1, \ldots, Y_n$ are independent copies of $Y$, we have $(Y_1 + \ldots + Y_n - b_n)/n^{1/\alpha} \sim Y$ for some sequence $(b_n)$. Note that $\alpha$-stable variables only exist for $0 < \alpha \le 2$; we assume for the rest of this Section that $\alpha < 2$.

Definition 5.1 If $X$ has a distribution function of the form

$$F_X(x) = \frac{c_1 + b_X(x)}{|x|^\alpha} \quad \text{for } x < 0,$$

$$1 - F_X(x) = \frac{c_2 + b_X(x)}{x^\alpha} \quad \text{for } x \ge 0,$$

where $b_X(x) \to 0$ as $x \to \pm\infty$, then we say that $X$ is in the domain of normal attraction of some stable $Y$ with tail parameters $c_1, c_2$.

Theorem 5 of Section 35 of Gnedenko and Kolmogorov (1954) shows that if $F_X$ is of this form, there exist a sequence $(a_n)$ and an $\alpha$-stable distribution function $F_Y$, determined by the parameters $\alpha, c_1, c_2$, such that

$$\frac{X_1 + \ldots + X_n - a_n}{n^{1/\alpha}} \xrightarrow{d} F_Y. \tag{10}$$

Although Equation (10) is obviously very similar to the standard Central Limit Theorem, one important distinguishing feature is that both $\mathbb{E}|X|^\alpha$ and $\mathbb{E}|Y|^\alpha$ are infinite for $0 < \alpha < 2$.



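We use the following moment bounds from von Bahr and Esseen (1965). If $X_1, X_2, \ldots$ are independent, then

$$\mathbb{E}|X_1 + \ldots + X_n|^r \le \sum_{i=1}^n \mathbb{E}|X_i|^r \quad \text{for } 0 < r \le 1, \tag{11}$$

$$\mathbb{E}|X_1 + \ldots + X_n|^r \le 2 \sum_{i=1}^n \mathbb{E}|X_i|^r \quad \text{when } \mathbb{E}X_i = 0, \text{ for } 1 < r \le 2. \tag{12}$$

Now, using ideas of Stout (1979), we show that for a subset of the domain of normal attraction, $d_\beta(X, Y) < \infty$, for some $\beta > \alpha$.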

Definition 5.2 We say that a random variable is in the domain of strong normal attraction of $Y$ if the function $b_X(x)$ from Definition 5.1 satisfies

$$|b_X(x)| \le \frac{C}{|x|^\gamma},$$

for some constant $C$ and some $\gamma > 0$.

Cramér (1963) shows that such random variables have an Edgeworth-style expansion, and thus convergence to $Y$ occurs. However, his proof requires some involved analysis and use of characteristic functions. See also Mijnheer (1984) and Mijnheer (1986), which use bounds based on the quantile transformation described above.

We can regard Definition 5.2 as being analogous to requiring a bounded $(2 + \delta)$th moment in the Central Limit Theorem, which allows an explicit rate of convergence (via the Berry-Esseen theorem). We now show the relevance of Definition 5.2 to the problem of stable convergence.

Lemma 5.3 If $X$ is in the domain of strong normal attraction of an $\alpha$-stable random variable $Y$, then $d_\beta(X, Y) < \infty$ for some $\beta > \alpha$.

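Proof We show that Major's construction always gives a joint distribution $(X^*, W^*)$ with $\mathbb{E}|X^* - W^*|^\beta < \infty$, and hence $d_\beta(X, W) < \infty$. Following Stout (1979), define a random variable $W$ by

$$\begin{aligned}
\mathbb{P}(W \ge x) &= c_2 x^{-\alpha} && \text{if } x > (2c_2)^{1/\alpha}, \\
\mathbb{P}(W \le x) &= c_1 |x|^{-\alpha} && \text{if } x < -(2c_1)^{1/\alpha}, \\
\mathbb{P}\bigl( W \in [-(2c_1)^{1/\alpha}, (2c_2)^{1/\alpha}] \bigr) &= 0.
\end{aligned}$$

Then for $w > 1/2$, $F_W^{-1}(w) = \{c_2/(1 - w)\}^{1/\alpha}$, and so for $x > 0$ with $F_X(x) > 1/2$,

$$x - F_W^{-1}(F_X(x)) = x \Bigl[ 1 - \Bigl( 1 + \frac{b_X(x)}{c_2} \Bigr)^{-1/\alpha} \Bigr].$$

Now, since $b_X(x) \to 0$, there exists $K$ such that if $x \ge K$ then $b_X(x) \ge -c_2/2$. By the Mean Value Inequality, if $t \ge -1/2$, then

$$\bigl| 1 - (1 + t)^{-1/\alpha} \bigr| \le \frac{2^{1 + 1/\alpha} |t|}{\alpha},$$

so that for $x \ge K$

$$|x - F_W^{-1} F_X(x)| \le \frac{2^{1 + 1/\alpha} |b_X(x)| \, x}{\alpha c_2}.$$

Thus, if $X$ is in the strong domain of attraction, then

$$\int_{|x| \ge K} |x - F_W^{-1} F_X(x)|^\beta \, dF_X(x) \le \Bigl( \frac{2^{1 + 1/\alpha} C}{\alpha c_2} \Bigr)^\beta \int_{|x| \ge K} |x|^{\beta(1 - \gamma)} \, dF_X(x).$$

Hence $d_\beta(X, W)$ is finite for all $\beta$ if $\gamma \ge 1$ and for $\beta < \alpha/(1 - \gamma)$, if $\gamma < 1$. Moreover, Mijnheer (1986, Equation (2.2)) shows that if $Y$ is $\alpha$-stable, then as $x \to \infty$,

$$\mathbb{P}(Y \ge x) = \frac{c_2}{x^\alpha} + O\Bigl( \frac{1}{x^{2\alpha}} \Bigr),$$

and so $Y$ is in its own domain of strong normal attraction. Thus using the construction above, $d_\beta(Y, W)$ is finite for all $\beta$ if $\alpha \ge 1$ and for $\beta < \alpha/(1 - \alpha)$ otherwise.

Recall that the triangle inequality holds, for $d_\beta$ or $d_\beta^\beta$, according as $\beta \ge 1$ or $\beta < 1$. Hence $d_\beta(X, Y)$ is finite for all $\beta$ if $\min(\alpha, \gamma) \ge 1$ and for $\beta < \alpha/(1 - \min(\alpha, \gamma))$ otherwise. □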

Note that for random variables $X_i$ in the same strong domain of normal attraction, $d_\beta(X_i, Y)$ may be bounded in terms of the function $b_{X_i}(x)$. In particular, if there exist $C, \gamma$ such that $|b_{X_i}(x)| \le C/|x|^\gamma$ then $\sup_i d_\beta(X_i, Y) < \infty$, so the hypothesis of Theorem 1.3 is satisfied.

Proof of Theorem 1.3 We use the bounds provided by Equations (11) and (12). We consider independent pairs $(X_i^*, Y_i^*)$ having the joint distribution that achieves the infimum in Definition 1.1. Then by rescaling we have that

$$d_\beta^\beta(S_n, Y) \le \mathbb{E} \Bigl| \frac{1}{n^{1/\alpha}} \sum_{i=1}^n (X_i^* - Y_i^*) \Bigr|^\beta \le \frac{2}{n^{\beta/\alpha}} \sum_{i=1}^n \mathbb{E}|X_i^* - Y_i^*|^\beta = \frac{2}{n^{\beta/\alpha}} \sum_{i=1}^n d_\beta^\beta(X_i, Y).$$

We deduce that in the case of identical variables, $d_\beta(S_n, Y)$ (and hence $d_\alpha(S_n, Y)$) converges at rate $O(n^{1/\beta - 1/\alpha})$. □

We now combine Theorem 1.3 and Lemma 5.3 to obtain a rate of convergence for identical variables. Note that Theorem 1.3 requires us to take $\beta \le 2$. Overall then we deduce that $d_\alpha(S_n, Y)$ converges at rate $O(n^{-t})$, where

1. if $\min(\alpha, \gamma) \ge 1$, we take $\beta = 2$, and hence $t = 1/\alpha - 1/2$;

2. if $\min(\alpha, \gamma) < 1$, we may take $\beta = \min[\alpha/\{1 - \min(\alpha, \gamma) + \epsilon\}, 2]$ for any $\epsilon > 0$, and then $t = \min(1/\alpha - 1/2, 1 - \epsilon, \gamma/\alpha - \epsilon)$.
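The two cases above can be written as a small helper. This is a sketch under the stated conventions only: the function name and the default value of `eps` are assumptions, and the formula is copied from the case distinction in the text rather than derived independently.

```python
def stable_rate_exponent(alpha, gamma, eps=1e-3):
    """Exponent t in the O(n^{-t}) rate, following the two cases in the text.

    alpha in (0, 2) is the stability index, gamma > 0 is the tail-remainder exponent
    of Definition 5.2, and eps > 0 is the slack used when min(alpha, gamma) < 1.
    """
    if not 0 < alpha < 2 or gamma <= 0 or eps <= 0:
        raise ValueError("need 0 < alpha < 2, gamma > 0 and eps > 0")
    if min(alpha, gamma) >= 1:
        return 1 / alpha - 1 / 2   # case 1: beta = 2
    return min(1 / alpha - 1 / 2, 1 - eps, gamma / alpha - eps)   # case 2
```

For example, `stable_rate_exponent(1.5, 2.0)` falls in the first case and returns $1/1.5 - 1/2$, while `stable_rate_exponent(0.5, 0.25, eps=0.01)` falls in the second case, where the term $\gamma/\alpha - \epsilon$ is the smallest of the three.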

Theorem 3.2 implies that if $d_r(S_n, Z_{\sigma^2})$ ever becomes finite, then it tends to zero, the counterpart of the following result.

Theorem 5.4 Fix $\alpha \in (0, 2)$, let $X_1, X_2, \ldots$ be independent random variables (where $\mathbb{E}X_i = 0$, if $\alpha > 1$), and let $S_n = (X_1 + \ldots + X_n)/n^{1/\alpha}$. Suppose there exists an $\alpha$-stable random variable $Y$ and $Y_1, Y_2, \ldots$ having the same distribution as $Y$, and satisfying

$$\frac{1}{n} \sum_{i=1}^n \mathbb{E}\{ |X_i - Y_i|^\alpha \, 1(|X_i - Y_i| > b) \} \to 0 \quad \text{as } b \to \infty. \tag{13}$$

If $\alpha \ne 1$ then $\lim_{n \to \infty} d_\alpha(S_n, Y) = 0$, and if $\alpha = 1$ then there exists a sequence $c_n = n^{-1} \sum_{i=1}^n \mathbb{E}(X_i - Y_i)$ such that $\lim_{n \to \infty} d_\alpha(S_n - c_n, Y) = 0$.

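Proof (Suggested by an anonymous referee). Fix $\epsilon > 0$. Suppose first that $1 \le \alpha < 2$ and let $d_i = \mathbb{E}(X_i - Y_i)$. Note that $d_i = 0$ if $\alpha > 1$. Let $b > 0$ and define

$$\begin{aligned}
U_i &= (X_i - Y_i) 1(|X_i - Y_i| \le b) - \mathbb{E}\{(X_i - Y_i) 1(|X_i - Y_i| \le b)\}, \\
V_i &= (X_i - Y_i) 1(|X_i - Y_i| > b) - \mathbb{E}\{(X_i - Y_i) 1(|X_i - Y_i| > b)\}.
\end{aligned}$$

Then by Equation (12),

$$\begin{aligned}
d_\alpha^\alpha(S_n - c_n, Y) &\le \frac{1}{n} \mathbb{E} \Bigl| \sum_{i=1}^n (X_i - Y_i - d_i) \Bigr|^\alpha \\
&\le \frac{2^{\alpha - 1}}{n} \Bigl\{ \mathbb{E} \Bigl| \sum_{i=1}^n U_i \Bigr|^\alpha + \mathbb{E} \Bigl| \sum_{i=1}^n V_i \Bigr|^\alpha \Bigr\} \\
&\le \frac{2^{\alpha - 1}}{n} \Bigl( \mathbb{E} \Bigl| \sum_{i=1}^n U_i \Bigr|^2 \Bigr)^{\alpha/2} + \frac{2^\alpha}{n} \sum_{i=1}^n \mathbb{E}|V_i|^\alpha \\
&\le \frac{2^{\alpha - 1}}{n} \Bigl( \sum_{i=1}^n \mathbb{E}U_i^2 \Bigr)^{\alpha/2} + \frac{2^{2\alpha - 1}}{n} \sum_{i=1}^n \mathbb{E}\{ |X_i - Y_i|^\alpha 1(|X_i - Y_i| > b) \} \\
&\qquad + \frac{2^{2\alpha - 1}}{n} \sum_{i=1}^n \bigl[ \mathbb{E}\{ |X_i - Y_i| 1(|X_i - Y_i| > b) \} \bigr]^\alpha \\
&\le \frac{2^{\alpha - 1} b^\alpha}{n^{1 - \alpha/2}} + \frac{2^{2\alpha}}{n} \sum_{i=1}^n \mathbb{E}\{ |X_i - Y_i|^\alpha 1(|X_i - Y_i| > b) \}.
\end{aligned}$$

The result follows on choosing $b$ sufficiently large to control the second term, and then $n$ sufficiently large to control the first.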

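For $0 < \alpha < 1$, take $U_i$ as before, take $V_i = (X_i - Y_i) 1(|X_i - Y_i| > b)$ and $\rho_i = \mathbb{E}\{(X_i - Y_i) 1(|X_i - Y_i| \le b)\}$. Now using Equation (11),

$$\begin{aligned}
d_\alpha^\alpha(S_n, Y) &\le \frac{1}{n} \mathbb{E} \Bigl| \sum_{i=1}^n U_i + \sum_{i=1}^n V_i + \sum_{i=1}^n \rho_i \Bigr|^\alpha \\
&\le \frac{1}{n} \mathbb{E} \Bigl| \sum_{i=1}^n U_i \Bigr|^\alpha + \frac{1}{n} \sum_{i=1}^n \mathbb{E}|V_i|^\alpha + \frac{1}{n} \Bigl| \sum_{i=1}^n \rho_i \Bigr|^\alpha \\
&\le \frac{1}{n} \Bigl( \mathbb{E} \Bigl| \sum_{i=1}^n U_i \Bigr|^2 \Bigr)^{\alpha/2} + \frac{1}{n} \sum_{i=1}^n \mathbb{E}\{ |X_i - Y_i|^\alpha 1(|X_i - Y_i| > b) \} + \frac{b^\alpha}{n^{1 - \alpha}},
\end{aligned}$$

so again since $b$ is arbitrary, the result follows. □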



Note that when $X_1, X_2, \ldots$ are identically distributed, the Lindeberg condition (13) reduces to the requirement that $d_\alpha(X_1, Y) < \infty$.